Correcting real-word spelling errors by restoring lexical cohesion
Abstract
Spelling errors that happen to result in a real word in the lexicon cannot be detected by a conventional spelling checker. We present a method for detecting and correcting many such errors by identifying tokens that are semantically unrelated to their context and are spelling variations of words that would be related to the context. Relatedness to context is determined by a measure of semantic distance initially proposed by Jiang and Conrath (1997). We tested the method on an artificial corpus of errors; it achieved recall of 23 to 50% and precision of 18 to 25%.

1 Real-word spelling errors

Conventional spelling checkers detect typing errors simply by comparing each token of a text against a dictionary of words that are known to be correctly spelled. Any token that matches an element of the dictionary, possibly after some minimal morphological analysis, is deemed to be correctly spelled; any token that matches no element is flagged as a possible error, with near-matches displayed as suggested corrections. Typing errors that happen to result in a token that is a correctly spelled word, albeit not the one that the user intended, cannot be detected by such systems. Such errors are not uncommon; Mitton [1987, 1996] found that “real-word errors account for about a quarter to a third of all spelling errors, perhaps more if you include word-division
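The idea summarized in the abstract (flag a token that is unrelated to its context when one of its spelling variations would be related) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon, the stopword list, and the `RELATED` table are hypothetical toy data standing in for a full dictionary and for a semantic distance measure such as the one attributed to Jiang and Conrath, and "spelling variation" is assumed here to mean a single insertion, deletion, substitution, or transposition.

```python
from string import ascii_lowercase

# Hypothetical toy data. A real system would use a full lexicon and a
# WordNet-based semantic distance measure instead of this lookup table.
LEXICON = {"the", "chef", "cooked", "a", "desert", "dessert",
           "with", "sugar", "and", "cream", "sand"}
STOPWORDS = {"the", "a", "with", "and"}
RELATED = {frozenset(pair) for pair in [
    ("dessert", "sugar"), ("dessert", "cream"), ("dessert", "chef"),
    ("chef", "cooked"), ("sugar", "cream"), ("desert", "sand"),
]}


def related(w1, w2):
    """Stand-in for a thresholded semantic relatedness test."""
    return w1 == w2 or frozenset((w1, w2)) in RELATED


def spelling_variations(word):
    """Lexicon words exactly one edit operation away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in ascii_lowercase}
    inserts = {a + c + b for a, b in splits for c in ascii_lowercase}
    return ((deletes | transposes | substitutes | inserts) & LEXICON) - {word}


def suggest_corrections(tokens):
    """Map token index -> sorted candidate corrections for suspect tokens."""
    suggestions = {}
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS:
            continue
        # Context: every other content word in the (short) text.
        context = [t for j, t in enumerate(tokens)
                   if j != i and t not in STOPWORDS]
        if any(related(tok, c) for c in context):
            continue  # the token coheres with its context; not suspect
        candidates = [v for v in spelling_variations(tok)
                      if any(related(v, c) for c in context)]
        if candidates:
            suggestions[i] = sorted(candidates)
    return suggestions


if __name__ == "__main__":
    sentence = "the chef cooked a desert with sugar and cream".split()
    print(suggest_corrections(sentence))
```

In this toy example, "desert" has no related word in its context while its one-letter variation "dessert" does, so it is the only token flagged. A token that is unrelated to its context but has no related spelling variation is left alone, which is one reason a method of this kind would not be expected to catch every error.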
Similar resources
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g., English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also, developing Persian tools will provide Persian progr...
An Evaluation of the Contextual Spelling Checker
Microsoft Office Word 2007 includes a “contextual spelling checker” that is intended to find misspellings that nonetheless form correctly spelled words. In an evaluation on 1400 examples, it is found to have high precision but low recall; that is, it fails to find most errors, but when it does flag a possible error, it is almost always correct. However, its performance in terms of F is inferior...
Evaluating WordNet-based Measures of Lexical Semantic Relatedness
The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors. An information-content–based measure proposed by Jiang and Conrath is found superior to those ...
Three-Phase Text Error Correction Model for Korean SMS Messages
In this paper, we propose a three-phase text error correction model consisting of a word spacing error correction phase, a syllable-based spelling error correction phase, and a word-based spelling error correction phase. In order to reduce the text error correction complexity, the proposed model corrects text errors step by step. With the aim of correcting word spacing errors, spelling errors, a...
Correcting Different Types of Errors in Texts
This paper proposes an unsupervised approach that automatically detects and corrects a text containing multiple errors of both syntactic and semantic nature. The number of errors that can be corrected is equal to the number of correct words in the text. Error types include, but are not limited to: spelling errors, real-word spelling errors, typographical errors, unwanted words, missing words, p...
Journal title:
- Natural Language Engineering
Volume 11, issue
Pages -
Publication date: 2005